BioSeek: Exploiting Source-Capability Information for Integrated Access to Multiple Bioinformatics Data Sources

Authors

  • Ling Liu
  • David Buttler
  • Terence Critchlow
  • Wei Han
  • Henrique Paques
  • Calton Pu
  • Daniel Rocco

Abstract

Modern bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, and protein structures) have created a problem that conventional data management systems do not face: finding which information sources, out of many candidates, are the most relevant and most accessible for answering a given user query. We refer to this as the query routing problem. In this paper we introduce the notion and issues of query routing, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source-capability profiles independently, and to provide algorithms that dynamically discover the information sources relevant to a given query through the smart use of these profiles. Compared to the keyword-based indexing techniques adopted in most search engines and software, our approach offers fine-grained interest matching, and is therefore more powerful and effective for handling queries with complex con...
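To make the idea of routing a query by progressively pruning candidate sources with capability profiles more concrete, here is a minimal sketch. It is not BioSeek's actual algorithm or profile format: the `SourceProfile` fields (`content_types`, `queryable_attrs`, `avg_latency_ms`), the `Query` type, the `route` function, the three pruning levels, and the source names are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical source-capability profile (assumed fields, not BioSeek's)."""
    name: str
    content_types: frozenset    # kinds of data the source holds
    queryable_attrs: frozenset  # attributes the source can filter on
    avg_latency_ms: float       # rough accessibility estimate


@dataclass(frozen=True)
class Query:
    content_type: str
    constrained_attrs: frozenset  # attributes referenced by the query's conditions


def route(query: Query, profiles: list, k: int = 2) -> list:
    """Return up to k source names, pruning the candidate set level by level."""
    # Level 1 (assumed): drop sources that do not hold the requested kind of data.
    candidates = [p for p in profiles if query.content_type in p.content_types]
    # Level 2 (assumed): drop sources that cannot evaluate every query condition.
    candidates = [p for p in candidates
                  if query.constrained_attrs <= p.queryable_attrs]
    # Level 3 (assumed): rank survivors by estimated accessibility, keep the k best.
    ranked = sorted(candidates, key=lambda p: p.avg_latency_ms)
    return [p.name for p in ranked[:k]]


# Hypothetical profiles for four made-up sources.
PROFILES = [
    SourceProfile("source_a", frozenset({"protein_sequence"}),
                  frozenset({"organism", "length"}), 120.0),
    SourceProfile("source_b", frozenset({"protein_sequence", "protein_structure"}),
                  frozenset({"organism"}), 40.0),
    SourceProfile("source_c", frozenset({"genome"}),
                  frozenset({"organism", "length"}), 10.0),
    SourceProfile("source_d", frozenset({"protein_sequence"}),
                  frozenset({"organism", "length", "function"}), 300.0),
]

if __name__ == "__main__":
    q = Query("protein_sequence", frozenset({"organism", "length"}))
    print(route(q, PROFILES))  # ['source_a', 'source_d']
```

In this sketch, a query for protein sequences constrained on organism and length first excludes `source_c` (wrong content type), then excludes `source_b` (it cannot filter on length), and finally orders the remaining sources by estimated latency. A keyword index, by contrast, would match any source mentioning these terms, whether or not it can actually evaluate the conditions.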

Similar resources

GDPC: connecting researchers with multiple integrated data sources

The goal of this project is to simplify access to genomic diversity and phenotype data, thereby encouraging reuse of this data. The Genomic Diversity and Phenotype Connection (GDPC) accomplishes this by retrieving data from one or more data sources and by allowing researchers to analyze integrated data in a standard format. GDPC is written in JAVA and provides (1) data sources availa...

Ontology-based integration for bioinformatics

Information integration systems support researchers in bioinformatics to retrieve data from multiple biological data sources. In this paper we argue that the current approaches should be enhanced by ontological knowledge. We identify the different types of ontological knowledge that are available on the Web and propose an approach to use this knowledge to support integrated access to multiple b...

Making whole genome multiple alignments usable for biologists

Summary: Here we describe a set of tools implemented within the Galaxy platform designed to make analysis of multiple genome alignments truly accessible for biologists. These tools are available through both a web-based graphical user interface and a command-line interface. Availability and implementation: This open-source toolset was implemented in Python and has been integrated into the onlin...

Integrating Biological Data and Tools with Bis

The access and exploitation of integrated data repositories and applications is critical for life science. Biologists design protocols that typically rely on complex query pipelines accessing various biological electronic resources (data sources and tools) to constitute data sets for analysis and mining. Integration platforms are needed to allow biologists to access, manipulate and analyze elec...

The bioinformatics resource for oral pathogens

Complete genomic sequences of several oral pathogens have been deciphered and multiple sources of independently annotated data are available for the same genomes. Different gene identification schemes and functional annotation methods used in these databases present a challenge for cross-referencing and the efficient use of the data. The Bioinformatics Resource for Oral Pathogens (BROP) aims to...

Publication date: 2003